Recursive Data Mining for Author and Role Identification
Abstract
Like paintings and verbal dialogues, written documents exhibit their author’s distinctive style, and identifying the author of an anonymous document is an important and challenging task in computer security. Even more challenging is identifying the style of a group of diverse individuals acting in similar circumstances, such as authors writing in a certain literary period or people writing in a certain social role. The latter application is important for analyzing hidden groups communicating over the Internet in which neither the identities nor the roles of the members are known. Other applications of identifying such styles include fraud detection, author attribution, and user profiling. The task of finding distinctive features of an artifact has much broader scientific implications, ranging from art and scriptures to network security. In this paper, we focus on capturing patterns in electronic documents. The approach discovers patterns at varying degrees of abstraction, in a hierarchical fashion. The discovered patterns capture the stylistic characteristics of an author, a group of authors, or even the specific role that an author plays in relation to others. These patterns are used as features to build efficient classifiers. Due to the nature of the pattern discovery process, we call our approach Recursive Data Mining (RDM). The discovered patterns allow for a certain degree of approximation, which is necessary for capturing non-trivial patterns in realistic datasets. To substantiate our methodology, we conduct experiments on the Enron and SEA datasets: the former categorizes members into organizational roles, and the latter categorizes a set of computer sessions. The results show that, in both role detection and author identification, a classifier that uses the dominant patterns discovered by RDM performs better than the same classifier without RDM-based features.
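The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of discovering patterns hierarchically: frequent n-grams in a token sequence are replaced by new symbols, and the rewritten sequence is mined again at the next level. The function names, the fixed n-gram length, the frequency threshold, and the exact (non-approximate) matching are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def dominant_ngrams(seq, n=2, min_count=2):
    """Return n-grams occurring at least min_count times, most frequent first."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return [gram for gram, c in counts.most_common() if c >= min_count]


def rewrite(seq, ids, n):
    """Greedily replace each matched n-gram (left to right) with its symbol."""
    out, i = [], 0
    while i < len(seq):
        window = tuple(seq[i:i + n])
        if window in ids:
            out.append(ids[window])
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def recursive_mine(seq, n=2, min_count=2, max_levels=3):
    """Mine patterns level by level (hypothetical helper, exact matching only).

    Returns one {pattern: symbol} dict per level and the final rewritten
    sequence. Higher-level patterns are built from lower-level symbols.
    """
    levels = []
    for level in range(1, max_levels + 1):
        patterns = dominant_ngrams(seq, n, min_count)
        if not patterns:
            break
        ids = {p: f"L{level}P{k}" for k, p in enumerate(patterns)}
        levels.append(ids)
        seq = rewrite(seq, ids, n)
    return levels, seq


if __name__ == "__main__":
    tokens = "a b a b c a b a b c".split()
    levels, final = recursive_mine(tokens)
    print(len(levels), final)  # prints: 3 ['L3P0', 'L3P0']
```

In a classification setting, the symbols found at each level could serve as features (for example, as counts per document); the approximate matching mentioned in the abstract is deliberately left out of this sketch.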
Similar resources
Recursive data mining for role identification in electronic communications
We present a text mining approach that discovers patterns at varying degrees of abstraction in a hierarchical fashion. The approach allows for a certain degree of approximation in matching patterns, which is necessary to capture non-trivial features in realistic datasets. Due to its nature, we call this approach Recursive Data Mining (RDM). We demonstrate a novel application of RDM to role identi...
Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques
Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...
Modelling Customer Attraction Prediction in Customer Relation Management using Decision Tree: A Data Mining Approach
In today’s quality-based competitive world, known as the knowledge age, customer attraction is of the utmost importance. In keeping with the slogan “customer is always right”, customer relation management is the core of an organizational strategy, playing an important role in four aspects: customer identification, customer attraction, customer retention, and customer satisfaction. Commercial organiza...
Data Mining for Identification of Forkhead Box O (FOXO3a) in Different Organisms Using Nucleotide and Tandem Repeat Sequences
Background: Deregulation of the FOXO3a gene, which belongs to the Forkhead box O (FOXO) transcription factors, can cause cancer (e.g. breast cancer). FOXO factors have an important role in ubiquitination, acetylation, de-acetylation, protein-protein interactions and phosphorylation. Understanding the regulation and mechanisms of FOXO3a can lead to cancer treatment. The aim of this study recent association...
Dynamic segmentation and ranking approach of customers and identifying their behavioral mobility using data mining techniques in Kargaran Welfare Bank
Nowadays, identifying customers, determining their value, and segmenting them are essential for a bank. This study addresses the dynamic classification of Kargaran Welfare Bank customers and the identification of their behavioral mobility between different departments over a specific period of time using data mining techniques. To this end, the transaction data of this bank's customers was taken as the statistical population. I...
Publication date: 2008